Language model size reduction by pruning and clustering
Abstract
Several techniques are known for reducing the size of language models, including count cutoffs [1], Weighted Difference pruning [2], Stolcke pruning [3], and clustering [4]. We compare all of these techniques and show some surprising results. For instance, at low pruning thresholds, Weighted Difference and Stolcke pruning underperform count cutoffs. We then show novel clustering techniques that can be combined with Stolcke pruning to produce the smallest models at a given perplexity. The resulting models can be a factor of three or more smaller than models pruned with Stolcke pruning, at the same perplexity. The technique creates clustered models that are often larger than the unclustered models, but which can be pruned to models that are smaller than unclustered models with the same perplexity.
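To make the compared baselines concrete, here is a minimal sketch of two of them: count cutoffs and a simplified Weighted Difference pruning score (probability times the log-probability loss from backing off). The data structures and function names are hypothetical, and backoff-weight renormalization, which full Stolcke pruning accounts for, is ignored for clarity.

```python
import math

def count_cutoff(ngram_counts, cutoff):
    """Count-cutoff pruning: discard every n-gram whose training
    count is at or below the cutoff."""
    return {ng: c for ng, c in ngram_counts.items() if c > cutoff}

def weighted_difference_prune(ngram_probs, backoff_probs, threshold):
    """Simplified Weighted Difference pruning: score each n-gram by
    its probability times the log-probability loss incurred when the
    model backs off to a shorter history, then keep only n-grams
    scoring above the threshold. (Renormalization of backoff weights
    is deliberately omitted in this sketch.)"""
    kept = {}
    for (history, word), p in ngram_probs.items():
        score = p * (math.log(p) - math.log(backoff_probs[word]))
        if score > threshold:
            kept[(history, word)] = p
    return kept

# Toy bigram model over a unigram backoff distribution.
bigrams = {(("the",), "cat"): 0.4, (("the",), "dog"): 0.05}
unigrams = {"cat": 0.1, "dog": 0.04}

print(count_cutoff({("the", "cat"): 7, ("the", "dog"): 1}, cutoff=1))
print(weighted_difference_prune(bigrams, unigrams, threshold=0.05))
```

In this toy run the rare bigram "the dog" is dropped by both criteria: its count falls at the cutoff, and its weighted-difference score (0.05 · ln(0.05/0.04) ≈ 0.011) falls below the threshold, while "the cat" survives.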
Similar papers
Cross-Validation and Minimum Generation Error based Decision Tree Pruning for HMM-based Speech Synthesis
This paper presents a decision tree pruning method for the model clustering of HMM-based parametric speech synthesis by cross-validation (CV) under the minimum generation error (MGE) criterion. Decision-tree-based model clustering is an important component in the training process of an HMM-based speech synthesis system. Conventionally, the maximum likelihood (ML) criterion is employed to choose...
Improving Language Model Size Reduction using Better Pruning Criteria
Reducing language model (LM) size is a critical issue when applying an LM to realistic applications with memory constraints. In this paper, three measures are studied for the purpose of LM pruning: probability, rank, and entropy. We evaluated the performance of the three pruning criteria in a real application, Chinese text input, in terms of character error rate (CER). We first p...
متن کاملDiscriminative Pruning of Language Models for Chinese Word Segmentation
This paper presents a discriminative pruning method for n-gram language models used in Chinese word segmentation. To reduce the size of the language model used in a Chinese word segmentation system, the importance of each bigram is computed with a discriminative pruning criterion related to the performance loss caused by pruning that bigram. We then propose a step-by-step growing algo...
Speech Recognition of Czech - Inclusion of Rare Words Helps
Large-vocabulary continuous speech recognition of inflective languages, such as Czech, Russian, or Serbo-Croatian, is heavily degraded by excessive out-of-vocabulary rates. In this paper, we tackle the problem of vocabulary selection, language modeling, and pruning for inflective languages. We show that by explicitly reducing the out-of-vocabulary rate we can achieve significant improvements in ...
Effect of pruning on growth, development, seed yield and active substances of Pumpkin (Cucurbita pepo convar. pepo var. styriaca)
The objective of this study was to investigate the effect of pruning at different developmental stages on the growth, development, seed yield, and active substances of medicinal pumpkin (these active substances are used to treat Benign Prostatic Hyperplasia (BPH)). The experiment was performed in an RCB design. Five pruning treatments at different developmental stages (no pruning, after 3-5 nod...